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Abstract 

Some  thoughts  on  large-scale  computer-aided  statistical  mathematics  (primarily 
simulation)  which  were  presented  at  the  6th  Annual  Conference  on  the  Computer 
Science/Statistics  Interface  conference  are  presented.   Comments  of  participants 
and  panelists  (D.  F.  Andrews,  J.  N.  Arvesen,  D.  P.  Gaver ,  and  G.  Marsaglia)  have 
been  added  to  the  original  text. 


1.   INTRODUCTION 

The  aim  of  this  paper  is  to  stimulate  discussion 
on  large-scale  computer-aided  statistical  mathe- 
matics (primarily  simulation)  at  this  conference. 
There  has  been  a  lot  of  discussion  of  computers 
and  statistics  (Hartley,  1972;  Milton  and  Nelder , 
1969;  Chambers,  1970),  but  little  of  large-scale 
use  of  computers  in  simulation  experiments  to 
solve  open  distributional  problems.   This  has  per- 
haps been  because  of  the  unavailability  of  large 
computers  and  large  amounts  of  computer  time  to 
research  workers  and  statisticians.   I  think  this 
will  change  rapidly  over  the  next  ten  years  as 
internal  computation  speed  and  the  size  of  random 
access  memories  go  up.   The  talk  given  by  Dr.  A. 
G.  Anderson  at  this  conference  has  amply  illus- 
trated this  trend. 

To  take  advantage  of  this  availability,  and  to  use 
the  internal  time  sharing  of  central  processing 
units  inherent  in  multiprogramming,  new  statistical 


techniques  which  are  computation-oriented  will  have 
to  be  developed.   There  is  already  growing  impetus 
in  this  direction  and  this  new  technology  coupled 
to  the  computers  will  make  an  enormous  impact  on 
statistics.   I  should  note,  too,  that  large-scale 
simulations  are  commonplace  in  industry  and  devel- 
opment laboratories,  but  the  inefficiency  of  most 
of  these  computations  is  appalling. 

There  are  recent  surveys  of  several  aspects  of 
statistical  computing  (Hemmerle,  1967;  Halton, 
1970;  Chambers,  1970;  Freiberger  and  Grenander, 
1971),  most  notably  that  by  Tukey  (1972b)  who  has 
been  responsible  for  many  of  the  new  ideas  in  sta- 
tistical computation.   Consequently,  I  will  only 
describe  here  the  evolution  of  a  computer  program 
called  COMPSTAT  which  was  developed  to  try  to  use 
the  IBM  360/91  computer  at  the  IBM  Research  Center 
as  efficiently  as  possible.   The  problems  encoun- 
tered in  developing  this  program,  some  solved  but 
others  open,  are  more  than  enough  for  one  paper. 
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My  interest  in  the  problem  of  large-scale 
statistical  computation  grew  from  the  frustration 
of  trying  to  deal  with  non-normal  time  series,  in 
particular,  point  processes  (Cox  and  Lewis,  1966), 
and  of  having  to  write  a  book  around  the  many  gaps 
in  the  distribution  theory.   The  first  problem  I 
tackled  was  the  distribution  of  product-moment 
statistics  (Lewis  and  Goodman,  1970),  since  one 
can,  in  principle,  find  recursion  relationships 
to  generate  the  distribution  for  successive  sample 
sizes   n.   It  took  six  months  to  verify  the 
mathematics,  six  months  to  program  it,  and  even 
then  I  wasn't  sure  enough  of  the  programming  to 
publish  the  results.   I  then  turned  to  simulation, 
and  quickly  ran  into  several  equally  frustrating 
problems : 

(1)  Many  procedures,  notably  variance  re- 
duction techniques,  were  very  particular 
to  the  problem  at  hand  and  difficult  to 
generalize.   For  example,  a  technique 
which  works  in  estimating  the  mean  of  a 
distribution  may  not  work  when  one  is 
also  interested  in  estimating  the  var- 
iance or  higher  moments. 

(2)  Most  published  statistical  estimation 
(point  and  interval)  techniques  were 
"valid"  asymptotically,  and  were  pro- 
hibitively expensive  in  terms  of  number 
of  operations  (addition  and  multipli- 
cation) and  memory  cells  required. 

(3)  Most  "canned"  routines  were  slow  and 
generally  unreliable. 

(4)  "Tooling  up"  took  an  excessive  amount 
of  time,  and  storing  results,  tabu- 
lating results  and  manipulating  results 
was  difficult. 

It  was  therefore  decided  to  look  into  the  proce- 
dures and  algorithms  available,  program  them 
efficiently  if  they  were  useful,  develop  new 
techniques  which  were  fast  and  economical  of 
storage  where  necessary,  and  put  them  into  a 
standard  program  which  could  be  used  for  large 
scale  simulations. 


Several  guidelines  were  set: 

a)  All  procedures  were  to  be  computationally 
simple,  use  as  little  memory  as  possible 
and  to  be  as  broadly  applicable  as  pos- 
sible.  In  particular,  this  meant  they 
should  use  as  little  information  as 
possible  about  the  statistic,  say   S,  to 
be  simulated.   For  example,  one  might  not 
want  to  use  the  specific  information  that 
S  was  positive. 

b)  To  utilize  the  speed  of  the  computers, 
the  best  way  seemed  to  be  to  compute  the 
distributions  of  as  many  statistics  as 
possible  simultaneously. 

c)  Memory  requirements  should  be  kept  fixed 
and  relatively  small  in  order  to  use 
excess  CPU  (Central  Processing  Unit) 
time  by  running  in  a  lowest  priority  par- 
tition in  a  multiprogrammed  invironment. 

A  block  diagram  of  the  overall  program,  COMPSTAT, 
which  was  developed  is  shown  in  Figure  1.   We  dis- 
cuss this  program  generally  before  going  into 
details  of  implementation  and  unsolved  problems 
in  later  sections.  • 

Referring  to  Figure  1,  the  symbol   n   is  used  to 
refer  to  sample  size  in  statistical  simulations, 
so  that  the  statistic  S  might  be  the  average 
of  the  observations  in  a  random  sample  of  size   n. 
Any  value  of   n  may  be  used  in  the  program, 
though  it  is  written  so  as  to  repeat  simulations 
on  successive  values  of   n   if  required.   A  number 
m  of  replications  is  specified  by  the  user,  with 
the  option  of  splitting  m  into   r  blocks  of 
size  m'   each   (m  =  rm1).   This  is  done  to  obtain 
estimates  of  the  variance  of  estimates  and  also 
to  allow  for  checkpoints  to  be  taken. 

On  each  replication  the  STATISTICS  GENERATOR  can 
call  for   £  ^  n  random  numbers,  unsorted  or 
sorted  by  magnitude.   The  user  writes  the  STA- 
TISTICS GENERATOR,  specifying  up  to  32  statistics 
(functions  of  the  I      random  variates) .   This 
has  proved  to  be  a  very  flexible  arrangement;  the 
statistics  could  be,  for  example, 
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Figure  I. 


a)  the  sample  serial  correlations  of  lags   1 
to   32   in  a  series  of  random  variables 
of  length  n; 

b)  the  waiting  times  of   32  successive  cus- 
tomers in  a  simulated  queue; 

c)  an  estimate  of  a  parameter  in  a  distribu- 
tion, the  jackknifed  estimate  of  the 
parameter,  the  jackknifed  variance,  and 
the  pseudo-values ; 

d)  32  points  in  the  simulated  spectrum  of  a 
time  series  of  length   n. 


There  are  many  other  possibilities.   For  each  of 
these  statistics  the  user  can  specify  that  he  wants 
estimates  of  the  first  four  moments  of   S,   16 
quantiles  of  the  distribution  of   S,   and  16  per- 
centiles of   S   (or  any  combination  of  these  three). 
Quantiles  here  is  used  to  mean  the  solution  xq 
of  the  equation  a  =  F  (x  ) ,  where  a  is  given  and 
F  (x)   is  the  distribution  of   S.   A  percentile  is 
just  F(x)   for  given  x.   We  assume  the  quantile 
exists . 


Mean 

Stand.  Dev, 

O" 

Lower 

Quantiles 

P 

xo.  001 

X0,  002 

X0.  005 

xo.  010 

X0.  020 

X0.  025 

X0.  050 

xo.  100 

Normal 
(Exact) 

11. 982 

2.  562 

6.989 

7.  197 

7.  512 

7.789 

8.  11  3 

8.  229 

8,  642 

9,  165 

Exponential 

1 1. 824 
(0.  001) 

2.  827 
(0.  001) 

6,  133 
(0,  005) 

6.  378 
(0,  003) 

6.746 
(0.  003) 

7.  068 
(0,  002) 

7,  445 
(0,  002) 

7,  580 
(0. 002) 

8,  063 
(0.  002) 

8,  668 
(0.  001) 

l/2  Weibull 

213,  828 
(0.  Oil) 

Skewness 
*1 

85,  678 
(0.  033) 

Kurtosi  s 
^2 

67.  580 
(0.  067) 

72.  483 
(0.  080) 

80.  052 
(0.  082) 

Upper  Q 

87.  068 
(0.  059) 

uantiles 

95.  41  0 
(0.  032) 

98.  523 
(0.  030) 

109,  7  07 
(0,  036) 

124.  384 
(0,  032) 

xo.  900 

"o.  950 

X0,97  5 

X0.  980 

xo.  990 

X0.  995 

X0.  998 

X0.  999 

Normal 
(Exact) 

1.  442 

15.324 

U.  764 

18.  176 

18.  627 

20.  024 

21.  415 

23.251 

24.  638 

Exponential 

1.  061 
(0.  003) 

2.  120 
(0.  023) 

15,  514 
(0.  004) 

17.  064 
(0.  005) 

18.  572 
(0.  005) 

19.  061 
(0.  006) 

20.  552 
(0.  009) 

11.  045 
(0.  009) 

24.  055 
(0,  024) 

25.  557 
(0,  033 

1/2   Weibull 

1.  557 
(0,  003) 

5.  082 
(0.  035) 

323.  012 
(0.  084) 

373.  912 
(0,  059) 

425.  816 
(0.  136) 

442.  749 
(0.172) 

496,  882 
(0.  182) 

553.  934 
(0.  328) 

634.  068 
(0.  472) 

700.  534 
(1    696) 

Table  1 


Table  1  shows  the  form  chosen  to  tabulate  the 
results  (moments  and  quantiles)  of  a  simulation 
involving  m  replications  for  each  n.   These 
results  are  averages  and  sample  standard  devia- 
tions of  the  results  of  the  r  blocks  of  m1 
replications,  all  of  this  being  stored  in  an 
archive  which  Tukey  has  aptly  called  the  SIDEPUT. 
The  estimated  standard  deviations  of  the  estimates 
are  given  in  brackets  just  below  the  estimates; 
below  them  we  give  (not  shown)  the  estimated 
quantiles  after  subtraction  of  the  estimated  mean 
y   and  division  by  the  estimated  standard  devia- 
tion a.   This  allows  the  experimenter  to  judge 
whether  the  statistic  is  approximately  normally 
distributed. 

The  last  blocks  in  Figure  1  allow  for  CUMULATIVE 
TABULATION  on  n,   EDITING  and  SMOOTHING  of  the 
results  (including  rounding  and  printing  tables 
for  publication) ,  and  GRAPHICAL  OUTPUT  as  shown 
in  Figures  2  &3.   It  is  easy  to  see  in  the  figures 
that  this  statistic  is  not  normally  distributed 
and  is  converging  very  slowly  with   n   to  the 
asymptotic   (n-*°°)   distribution.   The  positive 


skewness  of  the  distribution  is  also  evident. 

An  original,  rather  inefficient,  COMPSTAT  program 
was  used  to  implement  a  study  of  tests  of  inde- 
pendence in  point  processes.   Twenty  statistics 
were  computed  simultaneously  on  an  IBM  360/91  in 
a  120K  partition.   Some  of  these  results  have  been 
published  (Lewis,  1972)  and  are  partially  repro- 
duced here  (Figures  2  and  3) ;  others  will  appear 
later. 

A  study  on  a  similar  scale  of  robust  estimates  of 
location  was  undertaken  at  Princeton  (Andrews, 
et  al,  1972);  they  had  the  advantage  over  me  of 
both  manpower  and  expertise. 

It  is  hoped  to  rewrite  the  COMPSTAT  program  at 
some  later  time  in  order  to  incorporate  all  of  the 
recent  advances  in  statistical  computing  technol- 
ogy described  below. 

2.   DETAILS 

We  discuss  now  the  details  of  the  implementation 
of  a  program  such  as  COMPSTAT.  At  its  inception 
in  1966  we  quickly  ran  up  against  the  lack  of  real 


Figure  2 


computational  considerations  in  many  standard 
statistical  procedures.   The  situation  is  better 
at  present,  with  books  such  as  Hemmerle  (1967) 
and  Knuth  (1969)  now  available.   Knuth  (1969)  in 
particular  is  invaluable.   There  are  still  many 
problems,  however,  particularly  relating  to  large- 
scale  computations. 

a)   Random  Number  Generation 

Clearly  the  statistical  quality  of  the  random 
numbers  available  for  large-scale  simulations 
will  be  the  limiting  factor  in  how  far  one  can  go 
in  utilizing  large-scale  computers  in  simulations. 
In  1966  the  main  generator  in  use  was  RANDU  in 
the  IBM  SSP  package.   It  is  still  widely  used  to- 
day, by  default,  even  though  it  is  known  to 


knowledgeable  users  to  have  poor  statistical  pro- 
perties.  There  are  no  published  test  results  on 
RANDU,  except  one  brought  to  my  attention  at  the 
conference  (Bates  and  Zirkle,  1971)  but  there  are 
papers  published  on  problems  which  have  been  en- 
countered with  its  use.   Moreover,  as  a  statistical 
consultant  one  comes  up  against  many  cases  in  which 
strange  results  in  simulations  are  remedied  by  re- 
placing RANDU  by  another  random  number  generator. 

In  this  respect  it  might  be  noted  that  if  statis- 
ticians are  guilty  of  ignoring  computational 
aspects  of  their  procedures,  computer  scientists 
are  equally  guilty  of  ignoring  the  statistical 
aspects  of  algorithms.   There  are  hundreds  of 
clever  random  number  algorithms  in  the  literature 
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Figure  3 


[see  the  bibliography  by  Nance  and  Overstreet, 
1972]  but  there  are  virtually  no  accompanying  test 
results  published.  Moreover,  it  is  generally  dif- 
ficult to  do  so  in  a  computer  journal. 

In  1966  we  started  to  investigate  the  problem  of 
random  number  generation  for  large-scale  computa- 
tion, and  after  extensive  testing  we  developed 
a  31-bit  pseudo-random  number  generator  for  the 
System  360.   This  is  a  multiplicative  congruential 
generator  of  the  form  (Lewis,  Goodman,  and  Miller, 
1969) 

xi+1  =  Ax±   (mod  p) , 

31  5 

where  p ,  a  prime ,  is   2   -  1  and  A  =  7   is  a 


positive  primitive  root  of  p,  thus  guaranteeing 
a  cycle  of  length  p   for  the  generator.   Another 
advantage  of  using  a  positive  primitive  root  for 
the  multplier  is  that  low  order  bits  are  also 
random.   Beside  the  assembly  language  version  given 
in  the  paper,  a  very  fast  version  of  this 
generator  which  generates  arrays  of  integer  or 
floating  point  numbers  is  available  in  the  new  IBM 
SL/MATH  package.   We  will  refer  to  this  as  the  GGL 
generator;  it  generates  a  random  number  in  1.40 
Usees  on  a  360/91  and  in  16.00  usees  on  a  360/67. 
A  version  has  been  written  for  the  360/67  at  the 
Naval  Postgraduate  School  using  a  division  simu- 
lation algorithm  due  to  Lehmer  (see  Payne,  Rabung , 
Bagyo,  1969;  Liniger,  1961). 


Test  results  for  GGL  are  given  in  Lewis,  Goodman, 
and  Miller  (1969)  and  extensive  subsequent  use 
turned  up  no  obvious  problems,  though  constant 
care  was  exercised.   For  example,  in  Table  1  the 
maximum  periodogram  values  with  1/2-Weibull  dis- 
tributed variates  (squares  of  exponentially  dis- 
tributed variates)  are  very  large.   In  this  simu- 
lation  r  =  6,  m'  =  750,000  and  m  =  4,500,000. 
As  a  check,  normal  deviates  were  used  in  a  very 
large  simulation  and  no  discrepancy  from  the 
exact  distribution  (shown  in  Table  1)  even  at 
0.999   quantiles  was  found.   This  GGL  random 
number  generator  was  used  in  the  Princeton  study 
(Andrews,  et  al ,  1972)  and  is  used  in  APL. 

Nevertheless,  valid  doubts  continue  to  be  expres- 
sed about  the  use  of  the  GGL  generator  in  large- 
scale  computations,  particularly  in  light  of 
Marsaglia's  results  (Marsaglia,  1968,  1972)  on  the 
structure  of  sequences  of  numbers  from  congruen- 
tial  generators.   These  results  have  shed  a  lot 
of  light  on  problems  which  can  be  encountered 
with  congruential  generators,  but  I  don't  be- 
lieve they  tell  the  whole  story.   There  are  also 
new  types  of  generators  being  advocated.   Some 
of  these  are  too  cumbersome  for  consideration, 
but  others  are  popular,  in  particular  the  Taus- 
worthe  or  shift  register  generators  (Tausworthe, 
1965).   Some  doubt  has  been  cast  on  the  statis- 
tical properties  of  these  shift  register  pseudo- 
random number  generators  recently;  my  own 
preference  is,  for  speed  and  simplicity,  to  go  to 
shuffled  congruential  generators  (Marsaglia  and 
Bray,  1968).   Tukey  (1972)  ascribes  this  idea 
to  Gentlemen  but  it  seems  quite  old  and  was  put 
forward  by  Marsaglia  in  the  early  1960's.   I 
have  found  no  documentation  of  the  statistical 
properties  of  shuffled  generators,  although  they 
are  intuitively  appealing. 

We  have  undertaken  further  statistical  tests  of 
some  of  the  above  generators  at  the  Naval  Post- 
graduate School.  In  particular,  we  have  been 
interested  in  correlating  test  results  with 

*  I  am  indebted  to  Dr.  L.  R.  Turner,  NASA 
for  these  numbers. 


results  of  Couveyou  and  MacPherson,  1967;  (see 
also  Knuth,  1969,  pp.  82-100.)  and  Marsaglia,  1972. 
It  seems  to  me  that  the  Couveyou-MacPherson  Fourier 
analysis  is  the  best  analytical  tool  for  predicting 
performance  of  random  number  generators  that  has 
appeared.   Marsaglia's  results  [1972]  on  the  lat- 
tice structure  of  the  congruential  generators  are 
also  useful. 

The  tests  referred  to  started  with  the  GGL  random 
number  generator,  RANDU  and  a  TAUSWORTHE  generator 
(Tausworthe,  1965)  and  the  runs  test.   As  in  Lewis, 
Goodman,  and  Miller  (1969),  runs  of  length  eight 
or  longer  are  pooled  and  a  Chi-square  statistic 
computed.   Nominally,  this  has  a  Chi-square  dis- 
tribution with  7  degrees  of  freedom  and  we  denote 

2 
it  as   x7-   Table  2  gives  summary  statistics  on 

16 
20  runs  of   2    numbers  each  for  the  three  gener- 
ators.  The  Tausworthe  generator  is  now  analytic- 
ally known  to  have  poor  runs  performance  (Toothill, 
Robinson,  and  Adams,  1971).   The  runs  test  rejects 
neither  GGL  nor  RANDU  if  the  test  statistic  is  as- 
sumed to  be  distributed  as  a  Chi-square  variate 
with  7  degrees  of  freedom.   As  mentioned  before, 
RANDU  is  known  to  be  poor;  this  is  shown  in  Table 
3  giving  the  Couveyou-MacPherson  wave  numbers  for 
GGL  and  RANDU  for  dimensions  up  to  7.   It  is  in 
higher  dimensions  that  RANDU  is  particularly  poor.* 

Table  2.   Runs  test;  Chi-square  statistic. 


GGL 

RANDU 

TAUSWORTHE 

Table  3.   Wave  numbers  for  two  generators.* 

GGL  RANDU 

Dimension 

2  16,807  23,172 

3  638.9  10.86 

4  146.25  10.77 

5  67.21  10.77 

6  29.92 

7  16.55 

Lewis  Research  Center,  Cleveland,  Ohio, 


A7 

x2 
7 

6.846 

4.084 

7.939 

3.502 

9.972 

10.428 

A  comparison  of  the  three  samples  of  size  20  using 
a  two-sample  Kolmogorov  test  rejects  the  hypothesis 
that  any  of  them  are  from  the  same  distribution! 
This  is  a  sad  result  for  large-scale  simulation, 
particularly  if  one  were  trying  to  simulate  the 

distribution  of  the  Chi-square  summary  statistic, 

2 
X7,   for  the  runs  test. 

The  three  generators  have  been  shuffled  and  the 
Chi-square  statistics  of  100  tests  for  samples 
of  size  2    numbers  from  each  generator  were 
found  to  be  distributionally  commensurate.   The 
shuffled  Tausworthe  generator  was  still  suspect, 
however. 

Results  of  this  testing  will  be  given  elsewhere; 
other  statistical  tests  are  being  evaluated.   The 
conclusions  so  far  are  interesting.   There  is 
mild  evidence  that  shuffling  helps.   The  main  con- 
clusion seems  to  be,  however,  that  the  runs  test 
is  virtually  useless.   (Note  that  Bates  and  Zirkle 
accept  RANDU,  partly  on  the  basis  of  runs  tests.) 
And  although  recent  books  (Newman  and  Odell ,  1971; 
Maisel  and  Gnugnoli ,  1972)  tout  the  runs  test  as 
amongst  the  best,  I  have  been  unable  to  find  any 
documentation  for  this.   It  seems  to  be  an  example 
of  a  stochastic  rumor  to  which  I  too  have  contri- 
buted (Lewis,  Goodman,  and  Miller,  1969).   Perhaps 
some  readers  can  guide  me  to  work  on  the  power 
of  the  runs  test;  Lehman  (1959,  p.  155)  points 
out  that  a  modified  runs  test  has  certain  optimum 
properties  in  testing  for  independence  in  a  binary 
sequence  against  lst-order  Markov  alternatives. 
Even  results  on  the  power  of  the  runs  test  re- 
lative to  the  serial  correlation  test  for  first 
order  normal  autoregressive  schemes  would  be  of 
interest . 

There  is  clearly  much  work  to  be  done  in  random 
number  generation.   I  have  also  not  mentioned 
the  need  for  efficiently  generated,  reliable  nor- 
mally distributed  random  variates.   These,  and 
perhaps  several  other  kinds  of  random  deviates 
should  be  provided  as  primitives,  just  as  random 
numbers  are  provided  as  primitives  in  APL.   A 
package  to  generate  normally  and  exponentially 
distributed  random  variables  is  available  from 


Marsaglia  at  McGill  University.   It  uses  some  of 
Marsaglia's  own  methods  and  is  very  fast.   A  survey 
of  some  of  these  methods  is  given  by  Ahrens  and 
Dieter  (1972). 

b)   Ordering. 

Ordering  (sorting)  of  quantities,  or  obtaining 
ranks,  is  a  basic  operation  in  statistical  compu- 
tation; a  survey  is  given  by  Martin  (1971).   The 
main  use  we  had  for  it  initially  was  in  quantile 
estimation,  and  here  it  was  a  bottleneck  since, 
in  general,  ordering  of   n  quantities  takes  a 
number  of  operations  proportional  to  n(£n  n)   and 
memory  capacity  proportional  to  n.   The  quantile 
estimation  problem  is  discussed  below;  the  use  of 
ordering  is  now  used  mainly  in  COMPSTAT  in  gener- 
ating statistics  such  as  the  median.   There  are 
several  points  to  be  made  here. 

(i)   Uniformly  distributed  random  variates 
can  be  ordered  by  address  modification  schemes 
(Isaac  and  Singleton,  1956)  in  time  propor- 
tional to  n,   although  for  large  computations 
3n  memory  positions  are  needed  to  avoid  over- 
flows.  An  algorithm  for  this  type  of  sorting 
is  provided  in  COMPSTAT. 

(ii)   It  is  clear  that  by  using  pilot  estimates 
of  a  non-uniform  distribution,  address  modi- 
fication schemes  can  be  used  on  any  data. 
These  take,  asymptotically,   n   operations, 
but  for  reasonable  sample  sizes  the  procedure 
is  slow  and  cumbersome  programming-wise.   The 
scheme  is  due  to  Floyd  at  Stanford.   There  is 
renewed  interest  in  this  area  and  Chambers 
(1971)  has  a  scheme  for  partial  sorting  which 
is  more  efficient  than  an  n(£n  n)   sort. 
Andrews  (personal  communication)  also  has  a 
scheme  for  obtaining  the  median;  it  uses  a 
pilot  estimation  scheme  and  is  subject  to 
overflows  which  could  be  a  problem  in  large- 
scale  simulations. 

(iii)   Schemes  using  the  Markov  property  of 
the  gaps  (differences  between  successive  order 
statistics)  (see  David,  1971,  p.  17)  are 
available  for  producing  ordered  uniform  var- 
iates (Schucany,  1972;  Lurie  and  Hartley,  1972) 


We  had  tried  in  COMPSTAT  an  equivalent  scheme 
based  on  the  independence  of  the  gap  statis- 
tics for  exponentially  distributed  variates. 
These  schemes  for  moderate   n   are  more  time 
consuming  than  the   n(log  n)   schemes,  but  are 
more  efficient  in  use  of  memory  space.   Their 
primary  use  would  seem  to  be  when  only  a  few 
of  the  low  or  high  order  statistics  are  needed. 
Two  points  should  be  made  here: 

1.   It  is  computationally  easier  to  generate 
high  order  statistics,  rather  than  the  low 
order  statistics  advocated  by  Lurie  and 
Hartley  (1972)  and  Shucany  (1972).   Denoting 
the  uniform  variates  by   U.   and  the  ordered 


uniform  variates  by  U  , .  ,  we  have 
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If  only  low  order  uniform  order  statistics  are 
required,  they  are  generated  as  U  J. . .  =  1  - 

U(n+l-i)' 

2.   The  time  consuming  operation  in  the  above 
is  to  take  the  (l/i)th  power.   This  is  done, 
usually,  using  logarithms  and  the  scheme  is 
then  equivalent  to  generating  order  statistics 
from  a  unit  exponential  distribution.   However, 
since  it  is  much  faster  to  generate  exponen- 
tial variates  using  some  of  Marsaglia's  sam- 
pling procedures  than  it  is  to  generate  them 
by  taking  the  logarithm  of  a_  uniform  variate , 
it  is  faster  to  generate  ordered  uniform  ran- 
dom numbers  by  starting  with  exponential  var- 
iates . 

The  basis  for  this  is  that  if   E...,  i  =  1,  2, 
...n,   denotes  ordered  unit  exponential  vari- 
ates from  a  sample  of  size   n,   and  we  let 
E,n>  =  0,  the  gap  statistics  (Cox  and  Lewis, 
p.  26-27) 
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are  independent  exponentials  with  mean 
E(D(i))  =  (n+l-i)"1 


Thus  if  we  have   n  unit 
exponentials,  generated  say  by  one  of  Marsag- 
lia's schemes,  we  generate 
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and 


U(i)  =  1  -   6XP  iE(i)} 


(i  =  1,  ...n). 
An   n(log  n)   sorting  and  ranking  scheme  is  also 
provided  in  COMPSTAT.  for  sorting  and  ordering  with- 
in the  STATISTICS  GENERATOR. 

c)   Quantiles  and  Percentiles. 

Estimating  quantiles  was  the  second  biggest  bottle- 
neck in  implementing  COMPSTAT.   Quantiles  are  more 
basic  in  characterizing  distributions  than  percen- 
tiles, although,  for  example,  one  is  interested  in 
percentiles  when  evaluating  by  simulation  the  power 
of  a  test  based  on  a  statistic   S.   Thus,  given 
the   a-quantile  x   of   S  under  a  null  hypothesis, 
one  wants  the  percentile  corresponding  to   S  and 
x   under  a  different  hypothesis. 

Percentile  estimation  as  a  binomial  process  is  es- 
sentially straightforward  and  ideal  by  our  criter- 
ion of  simplicity  and  economy  of  computation  and 
memory  requirements.   It  is  also  unbiased.   However, 
it  appears  that  greater  efficiency  should  be  ob- 
tained by  coupling  estimates  at  different   x  's, 
although  I  haven't  been  able  to  do  so.   Most  schemes 
.appear  to  require  assumptions  about  boundedness  of 
the  probability  density  function.   Somerville  (1970) 
has  some  results  in  this  area;  it  appears  to  be  an 
area  for  further  research. 

Quantile  estimation  based  on  order  statistics  is 
advocated  in  most  texts  (see  David,  1971).   For 
large-scale  computation  the  sorting  time  required 
and  the  memory  capacity  is  prohibitive.   Stochastic 
approximation  schemes  (Robbins  and  Monro,  1951; 
Hodges  and  Lehman,  1956)  were  then  tried  but  found 
to  converge  at  an  impossibly  slow  rate  for  large 
quantiles.   These  two  quantile  estimation  schemes 
are  prime  examples  of  statistical  procedures  which 
are  not  attuned  to  computing  realities,  and  whose 
asymptotic  properties  are  deceptive  as  far  as  prac- 
tical applications  are  concerned. 

A  solution  was  finally  found  (Goodman,  Lewis,  and 
Robbins,  1972)  which  combined  the  stochastic  ap- 
proximation with  a  data  transformation.   Typically 


if  the  a-quantile  was  required   (a>0.5),  the 
maxima  of  successive  groups  of  size  v   of  reali- 
zations of   S   are  found.   The  problem  is  then 
one,  if   a'  =  a  ,  of  finding  the  x  ,   quantile, 

which  is  equal  to   x  ,   in  a  distribution  which 

th 
is  the  v —  power  of  the  distribution   S,   F  (x)  . 

By  taking  v  large  enough  to  make  a'   ~  1/2   the 
problem  becomes  one  of  estimating  a  median,  al- 
though other  values  of  v   can  be  used.   Stochastic 

approximations  work  well  with  medians,  but  as  the 

-1/2 
bias  is  apparently  of  order   m     ,   jackknifing 

is  required  to  reduce  the  bias. 

The  present  scheme  (Goodman,  Lewis,  and  Robbins, 
1972)  based  on  the  maximum  transformation  and 
stochastic  approximation  solves  the  basic  pro- 
blems of  quantile  estimation,  but  research  is  con- 
tinuing to  improve  it.   Computationally  it  is  very 
good  since  finding  a  maximum  requires  only  two 
memory  cells  and  computation  time  is  linear  in  m, 
the  number  of  realizations  of   S  which  are  gen- 
erated.  It  is  also  simple  to  compute  in  parallel 
the  quantiles  for  several  levels,  e.g.  a  = 
0.990,  0.995,  0.999. 

D.  Salsburg  has  raised  the  question  as  to  whether 
one  wouldn't  want  to  order  the  data  anyway  to  do, 
for  instance,  a  normal  probability  plot  of  the 
simulated  distribution.   This  may  be  true  for 
samples  of  size  m  equal  to  about  500;  beyond 
that  the  sorting  in  a  large-scale  simulation  be- 
comes onerous,  time-wise  and  memory-wise,  and  I 
feel  a  plot  using  16  quantiles,  as  in  Figure  2, 
plus  the  moments  in  Figure  3,  is  as  good  as  or 
better  than  a  full  probability  plot. 

d)   Bias  and  Bias  Reduction. 

It  is  essential  for  sensible  and  interpretable 
simulation  results  to  have  estimates  of  the  var- 
iances of  the  simulated  quantities.   However, 
sectioning  the  m  replications  in  a  large-scale 
simulation  into   r   sections  of  m'   replications 
to  estimate  the  variance  of  estimates  (see 
Mosteller  and  Tukey ,  1968)  brings  in  problems  of 
bias.   This  is  because  one  wants   r   to  be  about 
10  to  get  reliable  estimates  of  the  variance,  but 

the  resulting  m'   may  be  too  small  to  reduce  the 

10 


bias  in  the  simulated  quantity  to  acceptable  levels. 

This  problem  seems  to  be  well  in  hand  because  of 
the  jackknife  technique  for  bias  reduction  which 
was  developed  by  Quenouille  (1956) ,  pushed  by 
Tukey  (1958)  and  generalized  by  Schucany,  Gray, 
and  Owen  (1971)  and  Gray  and  Schucany  (1972).   A 
similar  technique  was  used  by  Gaver  and  Hoel  (1970) 
in  examining  small-sample  Poisson  probability  es- 
timates.  Some  price  may  be  paid  in  inflation  of 
the  variance  of  the  estimator  (Miller,  1964;  also 
Goodman,  Lewis,  and  Robbins,  1972,  for  a  specific 
case) . 

In  COMPSTAT  the  jackknife  is  quite  simply  incor- 
porated into  the  STATISTICS  GENERATOR. 

e)  Variance  Estimation. 

The  problem  of  bias  appears  to  have  been  alleviated 
directly  by  the  jackknife,  and  indirectly  because 
of  a  suggestion  by  Tukey  (1958)  that  the  sample 
standard  deviation  based  on  the  pseudo-values  in 
the  jackknife  procedure  be  used  to  estimate  the 
variance  of  the  jackknifed  estimate.   There  is 
some  evidence  that  this  procedure  is  broadly 
applicable,  although  Miller  (1968)  pointed  out 
cases  where  it  can  give  poor  results.   In  general, 
n-fold  jackknifing  in  a  small  sample  of  size   n 
can  give  an  estimate  with  a  very  inflated  variance, 
though  this  problem  disappears  as  n+°°.      Relevant 
references  are  Arvesen  (1969) ,  a  review  by  Arvesen 
and  Salsburg  (1972),  and  Mosteller  and  Tukey  (1968) 

The  jackknifing  procedure  will  probably  be  most 
useful  when  available  computation  time  is  too  short 
for  sectioning.   For  a  description  of  variance 
estimation  techniques  based  on  sectioning,  see 
Mosteller  and  Tukey  (1968). 

f )  Variance  Reduction  Techniques. 

I  have  not  discussed  variance  reduction  techniques 
so  far.   An  excellent  review  is  given  by  Gaver 
(1969);  see  also  Hammersley  and  Handscomb  (1964). 
These  variance  reduction  techniques  can  be  imple- 
mented in  COMPSTAT  but  there  seem  to  be  several 
drawbacks,  mainly  that  the  methods  are  particular 
to  the  problems  at  hand.   Thus,  a  large  amount  of 
time  can  be  spend  deriving,  say,  an  antithetic 


variate  for  a  particular  problem  and  this  may,  when 
large  computers  are  available,  be  an  inefficient 
way  to  use  statisticians. 

The  most  important  drawback  to  most  methods,  how- 
ever, is  that  a  method  that  reduces  the  variance 
of  an  estimate  of  the  mean  of  a  statistic   S  will 
often  inflate  the  variance  of  an  estimate  of  the 
variance  of   S.   This  is  clearly  true  for  many 
antithetic  variate  techniques  (Hammersley  and 
Mauldon,  1956)  and  would  be  worse  when  quantiles 
or  percentiles  are  also  required.   This  may  be 
all  right  in  nearly  normal  situations,  but  not  in 
others. 

Much  more  research  is  required  on  variance  reduction 
techniques  that  are  applicable  to  all  aspects  of 
the  characterization  of  a  distribution,  and  are 
easily  derived.   Control  variable  techniques 
(Fieller  and  Hartley,  1959)  seem  to  me  the  best 
candidate  for  this  role. 

An  empirical  control  variable  technique  can  be  im- 
plemented with  COMPSTAT  when  exploration  is  re- 
quired around  a  null  situation.   This  may,  for 
instance,  be  a  test  of  hypothesis  in  which  power 
against  small  deviations  is  of  interest.   Again, 
small  variations  in  scheduling  algorithms  in  com- 
plex queues  might  be  of  interest  to  see  what 
improvement  they  make  to,  say,  throughput  time. 

One  might  then  do  a  very  precise  simulation  of  the 
characterizations  of  the  statistic  under  the  null 
hypothesis.   Fix  m' ,   the  number  of  replications 
per  section,  and  let   r,   the  number  of  sections, 
be  large  and  denote  by   9i^r^   t'ie  estimated 
quantity  under  the  null  hypothesis.   This  will  be 
the  average  of  the  estimates  of   6   from  the   r 
sections.   Results  for  the  sections  are  kept  in 
the  SIDEPUT,  together  with  the  seed  for  the  random 
number  generator  which  initiates  each  section  of 
the  simulation.   The  quantity  is  estimated  under 
alternative  conditions  using  the  same  random  num- 
bers using  only   r'   sections,  where   r1  <<  r. 
Call  this  quantity  W(r').      If   $/r')   is  the 
null  (average)  estimate  from  the  first  r'   sec- 
tions,  (L(r-r')   the  null  (average)  estimate  from 
the  last   r  -  r'   sections,  then  the  control 


variable  estimate  is 


(r1)   =    e£(r')   -    60(r«)   +    6Q(r) 

=    6e(r»)    -    (I=|l)    l^r')   +    (i=El)    T0(r-r') 
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The  common  random  numbers  used  to  generate  the  es- 
timates should  make  the  estimates   6  (r1)   and 
&n(r')   highly  correlated,  and  the  above  equation 
is  the  variance  in  the  usual  control  variable  sit- 
uation except  for  the  last  term.   If   r   is  large 
relative  to  r'   this  last  term  should  be  small 
relative  to  the  other  terms. 

It  is  possible  to  use  subsequent  sections  of  size 
r'   in  the  original  simulation  of   &„   to  explore 
other  alternatives,  say   6  ,  i     ,  ....   There  are 
interesting  design  and  analysis  problems  in  this 
scheme  which  will  be  explored  elsewhere. 

One  final  point  should  be  made  here  about  control 
variables.  Let  6  be  the  uncontrolled  estimate 
and  6  the  controlled  estimate  (generated  from 
the  same  random  numbers).  It  is  not  often  real- 
ized that  even  with  a  regression  adjusted  control 
(see  Gaver,  1969)  the  maximum  attainable  variance 
reduction  is 


var 


&=i-°2. 


var(  5) 

where   P  is  the  correlation  between     and   6. 
It  can  be  very  difficult  and  time-consuming,  es- 
pecially for  the  inexperienced  practitioner,  to 
find  a  control  which  gives  a  high  enough  p   to 
justify  the  pratitioners  time.   And  in  many  cases 
equivalent  speed  ups  can  be  achieved  by  using  more 
efficient  random  number  generators,  ordering  rou- 
tines ,  etc . 

The  time  factor  to  achieve  a  high  p   is  one  rea- 
son for  putting  forward  the  empirical  scheme  above. 

g)   Planning  Simulation  Experiments. 

The  empirical  control  variable  suggestion  in  the 
previous  section  brings  up  the  whole  question  of 


the  design  of  simulation  experiments.   Thus,  it 
would  be  reasonable  to  use  the  empirical  scheme, 
or  plan  an  experiment  around  the  null  value   EL? 
This  would  be  appropriate  if  the  range  of  para- 
meters of  interest  were  known  in  advance.   The 
empirical  control  variable  technique  seems  at- 
tractive as  an  on-line,  interactive  procedure, 
especially  when  estimates  of  the  variances  of  the 
estimates  are  available,  as  in  COMPSTAT.   Some 
formal  analysis  is  still  needed  and  this  could 
be  formidable. 

In  general,  it  would  seem  that  the  output  of 
large-scale  simulation  would  be  a  fertile  field 
for  application  of  techniques  of  analysis  of  var- 
iance and  experimental  design.   I  am  not  familiar 
with  much  by  way  of  specific  applications;  several 
recent  books,  including  that  by  Mihram  (1972), 
which  I  have  not  examined  carefully,  do  treat 
analysis  of  simulation  experiments.   The  tendency, 
however,  does  seem  to  be  to  just  regurgitate  the 
old  theory  without  specifically  worrying  about 
particular  problems  of  simulation  experiments. 

A  simple  case  occurs  when  an  experimenter  has  two 
variance  reduction  techniques  available,  say  two 
control  variables,  and  a  fixed  number  of  repli- 
cations  m  he  can  perform.   He  wants  to  choose 
the  control  variable  which  minimizes  the  variance 
of  the  final  estimate  of  a  parameter,  say   6, 
which  could  be  the  mean  of  a  statistic   S.   If 
m1   is  large  enough  so  that  the  estimates  in  each 
of  the  r  sections   (rm'=m)   are  unbiased  and 
normally  distributed,  this  is  a  classical  two  arm 
bandit  problem. 

I  know  of  no  one,  however,  who  has  actually  done 
this,  probably  because  the  benefit  of  reduced 
variance  doesn't  outweigh  the  extra  cost  of  tool- 
ing up  for  two  estimates  of   G.   It  could  be 
feasible  with  COMPSTAT.   Once  more  than  one  para- 
meter is  involved,  say  the  mean  and  variance  of 
S,  the  problem  is  much  more  complicated.   In 
general  I  think,  however,  that  as  computers  de- 
velop simulation  will  make  many  statistical  prac- 
tices developed  in  vacuo  widely  useful. 

Some  of  many  other  open  design  problems  can  be   -,  j 


seen  by  considering  Figure  2,  where  estimated  quan- 
tiles  of  a  distribution  are  plotted.   One  would 
generally  want  to  smooth  these  plots  or  fit  some 
regression  function  to  assess  the  rate  of  conver- 
gence to  the  asymptotic  normal  distribution.   There 
are  problems  in  that  the  number  of  simulations,   m, 
was  fixed  in  advance  and  thus,  the  variances  at 
each  n  vary.   Moreover,  one  would  want  to  couple 
the  smoothing  or  regression  analysis  of  the  var- 
ious quantiles.   These  are  both  functionally  and 
statistically  correlated   for  each  n  across 
quantiles  and  with  n   for  each  quantile. 

Detailed  analysis  of  such  graphic  output  needs 
much  more  work;  it  is  possible  that  the  work  of 
Efron  and  Morris  (1972)  may  be  relevant  to  this 
problem. 

Besides  the  smoothing,  any  program  such  as  COMPSTAT 
should  provide  facility  for  direct  plotting  of 
output  tables  of  rounded  and  perhaps  smoothed  data. 
This  is  one  facility  computer  scientists  can  pro- 
vide us  with. 

3.   MISCELLANEOUS  PROBLEMS  AND  OPEN  QUESTIONS. 

I  have  not  touched  on  many  questions  in  large-scale 
simulation.   A  few  are  discussed  here  to  emphasize 
that  there  are  many  problems  that  do  not  even  start 
to  fit  on  present  or  future  computers.   Thus,  sim- 
ulation, especially  without  some  analytic  support, 
is  not  always  a  possible  way  out  of  problems,  al- 
though some  people  feel  simulation  is  the  last 
resort.   Other  questions  discussed  below  indicate 
that  there  are  simple  problems  we  cannot  handle. 

a)   Conditional  distributions. 

Conditioning  poses  problems  in  simulations  which 
I  do  not  know  how  to  handle  efficiently.   Thus,  in 
fitting  exponential  polynomials  to  data  from  a  non- 
homogeneous  Poisson  process  (Lewis,  1972)  observed 
for  a  time   t  ,   one  wants  to  condition  on  the 
number,   n,   of  events  observed  in   (0,t  ).   The 

times  to  events   t.   are  then  order  statistics 

l 

from  a  uniform  random  sample  of  size   n.   In  test- 
ing for  a  second  order  term  in  the  polynomial  one 

2 
wants  the  conditional  distribution  of   St.,   given 

n  and   Et . .   Conceptually  this  is  simple  to  see, 


as   Zt.   is  the  distance  from  the  origin  to  the 

n  -  1   dimensional  hyperplane  defined  by  fixing 

2 
Zt..   The  joint  asymptotic  normality  of   Zt   and 
i  l 

Zt.   give  the  result  that  for  large   n,   Zt?/n 
has  a  conditional  normal  distribution  with  mean 
(Lewis,  1972). 

2i   v.  v    .  2   ^i    1 


=  E(Itf|n;[t.)  =  t"   V 
L   i'  L   l     0  nt, 


and  standard  deviation 

a  = 


"0 


1/2  * 


(12nV 
How  does  one  simulate  this  problem  for  small   n 

and  assess  the  rate  of  convergence  to   n?   This 

must  be  a  very  common  problem. 

b)  Multivariate  problems. 

I  have  not  mentioned  simulation  of  multivariate 
statistics  J3.   An  immediate  problem  here  is  that 
quantiles  and  percentiles  are  not  uniquely  defined, 
so  one  has  to  use  joint  moments,  which  could  be 
estimated  in  COMPSTAT,  or  rely  on  probability 
density  functions.   I  have  not  discussed  density 
estimation  here  at  all.   Multivariate  problems, 
of  course,  also  bring  in  new  aspects  of  graphical 
and  tabular  output  which  are  non-trivial. 

c)  Simulated  maximum  likelihood. 

As  a  last  stab,  I  would  like  to  mention  another 
area  which  interests  me.   In  complicated  time 
series  we  now  have  computationally  feasible  tools 
such  as  spectral  analysis  to  help  in  defining  and 
delineating  models.   Once  this  is  done,  however, 
there  are  often  no  reasonable  ways  of  estimating 
parameters  of  the  model,  especially  since  likeli- 
hoods cannot  be  derived,  even  though  the  model  is 
structurally  simple.   It  would  be  useful  to  simu- 
late the  joint  density  of  the  observations  at  the 
observed  data  point  as  a  function  of  the  para- 
meters so  as  to  find  the  maximum  likelihood  es- 
timates of  the  parameters.   I  assume  this  is  worth 
the  cost  to  the  experimenter.   One  then  has  a  more 
complicated  case  of  a) ,  closely  related  to  re- 
sponse surface  designs.   The  solution  seems  to  be 
far  away . 

The  reader  is  referred  to  the  papers  by  Tukey 
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(1972  a,  b)  for  further  problems. 
4.   ACKNOWLEDGEMENTS 

I  wish  to  thank  A.  S.  Goodman  of  the  IBM  Yorktown 
Heights  Research  Center  for  his  long  involvement 
with  the  COMPSTAT  program,  G.  P.  Learmonth  of  the 
Naval  Postgraduate  School  for  his  co-work  on  test- 
ing random  number  generators,  and  D.  P.  Gaver  and 
G.  S.  Shedler  for  stimulating  my  interest  in  these 
problems.   Also  A.  G.  Anderson  for  support  at  IBM 
on  the  COMPSTAT  program. 

5.   REFERENCES 

1)  Ahrens,  J.  H. ,  and  Dieter,  U.  (1972).   Computer 

Methods  for  sampling  from  the  exponential  and 
normal  distributions.   Communications  of  the 
ACM,  15,  873-882. 

2)  Andrews,  D.  F.,  Bickel,  P.  J.,  Hampel,  F.  R. , 

Huber,  P.  J.,  Rogers,  W.  H.  and  Tukey,  J.  W. 
(1972).   Robust  Estimates  of  Location:   Sur- 
vey and  Advances.   Princeton  University  Press: 
Princeton,  New  Jersey. 

3)  Arvesen,  J.  N.  (1969).   Jackknifing  U-statis- 

tics.   Annals  of  Mathematical  Statistics, 
40,  2076-2100. 

4)  Arvesen,  J.  N. ,  and  Salsburg,  D.  S.  (1972). 

Approximate  tests  and  confidence  intervals 
using  the  jackknife.   To  appear  in  Perspec- 
tives in  Biometry,  R.  Elashoff  (ed.).   Aca- 
demic Press,  New  York. 

5)  Bates,  C.  B. ,  and  Zirkle,  J.  A.  (1971). 

Analysis  of  random  numbers  from  four  uniform 
random  number  generators.   Report  TR4-71, 
U.  S.  Army  Combat  Development  Command,  Fort 
Belvoir,  Virginia.   (August,  1971). 

6)  Chambers,  J.  (1970).   Computers  in  Statistical 

Research:   Simulation  and  Computer-Aided 
Mathematics.   Technometrics,  12,  1-16. 

7)  Chambers,  J.  (1971).   Algorithm  410,  Partial 

sorting  [Ml].   Comm.  ACM,  14,  357-8. 

8)  Couveyou,  R.  L. ,  and  MacPherson,  R.  D.  (1969). 

Fourier  analysis  of  uniform  random  number  gen- 
erations.  Journal  of  ACM,  14,  100-119. 


9)   Cox,  D.  R.,  and  Lewis,  P.  A.  W.  (1966).   The 
statistical  analysis  of  series  of  events. 
Methuen:   London  and  Barnes  and  Noble:   New 
York. 

10)  David,  H.  A.  (1971).   Order  Statistics. 

Wiley:   New  York. 

11)  Efron,  B. ,  and  Morris,  C.  (1972).   Limiting 

risk  of  Bayes  and  Empirical  Bayes  Estima- 
tions -  Part  II.   Journal  Am.  Statist. 
Assoc. ,  67,  130-139. 

12)  Fieller,  E.  C,  and  Hartley,  H.  0.  (1959). 

Sampling  with  control  variables.   Biometrika , 
41,  494. 

13)  Freiberger,  W. ,  and  Grenander,  U.  (1971). 

A  short  course  in  computational  probability 
and  statistics.   Springer-Verlag:   Berlin. 

14)  Gaver,  D.  P.,  and  Hoel,  D.  G.  (1970).   Com- 

parison of  certain  small-sample  Poisson 
probability  estimates.   Technometrics ,  12 , 
835-850. 

15)  Gaver,  D.  P.  (1969).   Statistical  methods  for 

improving  simulation  efficiency.   Proc.  3rd 
Annual  Conference  on  Applications  of  Simu- 
lation:  Los  Angeles,  Dec.  1969. 

16)  Goodman,  A.  S.,  Lewis,  P.  A.  W. ,  and  Robbins, 

H.  E.  (1972).   Simultaneous  estimation  of 
large  numbers  of  extreme  quantiles  in  simu- 
lation experiments.   Submitted  to  J.  Amer . 
Statist.  Assoc. 

17)  Gray,  H.  L.,  and  Shucany,  W.  R.  (1972). 

The  generalized  jackknife  statistic.   Marcel 
Dekker :   New  York. 

18)  Halton,  J.  H.  (1970).   A  retrospective  and 

prospective  survey  of  the  Monte  Carlo  method. 
SIAM  Review,  JJ2,  1-63. 

19)  Hammersley,  J.,  and  Handscomb,  D.  C.  (1964), 

Monte  Carlo  Methods.   Methuen:   London. 

20)  Hammersley,  J.  M.  and  Mauldon,  J.  G.  (1956). 

General  principles  of  antithetic  variates. 
Proceedings  of  the  Cambridge  Philosophical 


Society,  52,    476-481. 


21)  Hartley,  H.  0.  (1972).   The  impact  of  computers 

on  statistics.   Proc.  of  Computer  Science  and 
Statistics:   5th  Annual  Symposium  on  the 
Interface.   M.  0.  Locks  (ed . ) .   Western  Per- 
iodical Co.:   Hollywood,  California. 

22)  Hemmerle,  W.  J.  (1967).   Statistical  computa- 

tions on  a  digital  computer.   Blaisdell: 
Waltham,  Massachusetts. 

23)  Hodges,  J.  L.,  and  Lehman,  E.  L.  (1956).   Two 

approximations  to  the  Robbins-Monro  process. 
Proc.  3rd  Berkeley  Symposium  on  Mathematical 
Statistics  and  Probability,  1_,  95-104. 

24)  Isaac,  E.  J.  and  Singleton,  R.  C.  (1956). 

Sorting  by  address  calculation.   Journal  of 
ACM,  3.,  169-174. 

25)  Knuth,  D.  E.  (1969).   Fundamental  Algorithms. 

Addison-Wesley :   Reading,  Massachusetts. 

26)  Lehman,  E.  L.  (1959).   Testing  Statistical 

Hypotheses.   Wiley:   New  York. 

27)  Lewis,  P.  A.  W.  (1972).   Recent  results  in  the 

statistical  analysis  of  univariate  point 
processes.   In  Stochastic  Point  Processes, 
P.  A.  W.  Lewis  (ed.).   Wiley:   New  York. 

28)  Lewis,  P.  A.  W. ,  Goodman,  A.  S. ,  and  Miller, 

J.  M.  (1969).   A  pseudo- random  number  gener- 
ator for  the  System/360.   IBM  Systems 
Journal,  8,  136-146. 

29)  Lewis,  P.  A.  W. ,  and  Goodman,  A.  S.  (1970). 

The  null  distribution  of  the  first  three 
product-moment  statistics  for  exponential, 
half-Gamma,  and  normal  scores.   In  Selected 
Tables  in  Mathematical  Statistics,  Harter, 
H.  L.,  and  Owen,  D.  B.  (eds.).   Markham 
Publishing  Co.:   Chicago,  Illinois. 

30)  Liniger,  W.  (1961).   On  a  method  by  D .  H. 

Lehmer  for  the  generation  of  pseudo-random 
numbers.   Numerische  Mathematik,  _3,  265-270. 

31)  Lurie,  D.,  and  Hartley,  H.  0.  (1972).   Ma- 

chine-generation of  order  statistics  for 
Monte  Carlo  computations.   The  American 
Statistician,  26,  26-29. 


14 


32)  Maisel,  H.,  and  Gnugnoli,  G.  (1972).   Simula- 

tion of  discrete  stochastic  systems.   Science 
Research  Associates,  Inc.:   Chicago. 

33)  Marsaglia,  G.,  and  Bray,  T.  A.  (1968).   One- 

line  random  number  generators  and  their  use 
in  combinations.   Comm.  ACM,  11,  757-759. 

34)  Marsaglia,  G.  (1968).   Random  numbers  fall 

mainly  in  the  planes.   Proc .  National  Academy 
of  Sciences,  61 ,  25-28. 

35)  Marsaglia,  G.  (1972).   The  structure  of  linear 

congruential  sequences.   In  Applications  of 
Number  Theory  in  Numerical  Analysis,  S.  K. 
Zaremba  (ed.).   Academic  Press:   New  York. 

36)  Martin,  W.  A.  (1971).   Sorting.   Computing 

Surveys ,  _3,  147-74. 

37)  Mihram,  G.  A.  (1972).   Simulation:   Statisti- 

cal Foundations  and  Methodology.   Academic 
Press :   New  York. 

38)  Miller,  R.  G.,  Jr.  (1964).   A  trustworthy 

jackknife.   Annals  of  Mathematical  Statistics 
J35,  1594-1605. 

39)  Miller,  R.  G. ,  Jr.  (1968).   Jackknifing  var- 

iances.  Annals  of  Mathematical  Statistics, 
39,  567-582. 

40)  Milton,  R.  C,  and  Nelder,  J.  A.  (1969). 

Statistical  Computation.   Academic  Press: 
New  York. 

a 

41)  Mosteller,  F.  ,  and  Tukey,  J.  W.  (1968). 

Data  analysis,  including  statistics.   Hand- 
book of  Social  Psychology,  2nd  ed . ,  V.  2. 
G.  Lindzey  and  E.  Aronson  (eds.).   Addison- 
Wesley:   Reading,  Mass.  80-203. 

42)  Nance,  R.  E.,  and  Overstreet,  C.  (1972). 

A  bibliography  on  random  number  generation. 
Computing  Reviews  (Bibliography,  29)  13, 
495-508. 

43)  Newman,  T.  G.,  and  Odell,  P.  L.  (1971).   The 

generation  of  random  deviates.   Griffin: 
London  and  Hafner:   New  York. 


44)  Payne,  W.  H.,  Rabung,  J.  R.,  and  Bagyo,  T.  P. 

(1969).   Coding  the  Lehman  pseudo-random 
number  generator.   Comm.  of  ACM,  12,  85-86. 

45)  Quenouille,  M.  H.  (1956).   Notes  on  bias  in 

estimation.   Biometrika,  43,  353-360. 

46)  Robbins,  H.  E.,  and  Monro,  S.  (1951).   A  stoch- 

astic approximation  method.   Anna  1 s  of  Ma t he- 
matical  Statistics,  22,  400-407. 

47)  Schucany,  W.  R.  (1972).   Order  statistics  in 

simulation.   J.  Statist.  Comput.  Simul.  ,  1, 
281-286. 

48)  Schucany,  W.  R.,  Gray,  H.  L.,  and  Owen,  D.  B. 

(1971).   On  bias  reduction  in  estimation. 
Journal  Amer.  Statist.  Assoc. ,  6^,  524-533. 

49)  Somerville,  P.  N.  (1970).   A  technique  for  the 

computation  of  percentage  points  of  a  statis- 
tic.  Technometrics,  12,  373-382. 

50)  Tausworthe,  R.  C.  (1965).   Random  numbers  gen- 

erated by  linear  recurrence  modulo  two. 
Math  Comp. ,  19,  201-209. 

51)  Toothill,  J.  P.  R. ,  Robinson,  W.  D.,  and  Adams, 

A.  G.  (1971).   The  runs  up-and-down  perfor- 
mance of  Tausworthe  pseudo-random  number 
generators.   Journal  of  ACM,  1_8,  381-399. 

52)  Tukey,  J.  W.  (1958).   Bias  and  confidence  in 

not-quite  large  samples.   Annals  of  Mathe- 
matical Statistics,  (abstract),  29,  614. 

53)  Tukey,  J.  W.  (1972a).  Lags  in  statistical 

technology.   Canadian  Journal  of  Statistics. 

54)  Tukey,  J.  W.  (1972b).  How  computing  and 

statistics  affect  each  other.   To  appear. 

Note  added  in  proof:   I  am  indebted  to  Dr.  J.  P.  C. 
Kleijnen  for  pointing  out  some  references  on  the 
design  and  analysis  of  computer  simulation  exper- 
iments. 

Naylor,  T.  H.  (ed.)  (1969).   The  design  of 
computer  simulation  experiments.   Duke  Uni- 
versity Press.   Durham,  N.  C. 


15 


Kleijnen,  J.  P.  C,  Naylor,  T.  H.,  and  Seaks , 
T.  G.   (1972).   The  use  of  multiple  ranking 
procedures  to  analyze  simulations  of  manage- 
ment systems:  a  tutorial  MANAGEMENT  SCIENCE, 
APPLICATION  SERIES,  18,  245-257. 

Kleijnen,  J.  P.  C.  (1972).  The  statistical 
design  and  analysis  of  computer  simulation 
experiments:  a  survey.  MANAGEMENT  INFOR- 
MATICS, 1,  57-66. 


16 


INITIAL  DISTRIBUTION  LIST 

No.  Copies 

Defense  Documentation  Center  12 

Cameron  Station 

Alexandria,  Virginia    22314 

Dean  of  Research  (Code  023)  1 

Naval  Postgraduate  School 
Monterey,  California   93940 

Library  (Code  0212)  2 

Naval  Postgraduate  School 
Monterey,  California   93940 

Library  (Code  55)  3 

Department  of  Operations  Research 

and  Administrative  Sciences 
Naval  Postgraduate  School 
Monterey,  California    93940 

Professor  D.  R.  Cox  1 

Department  of  Mathematics 
Imper'il  College 
London,  England 

Professor  B.  Epstein  1 

Israel  Institute  of  Technology 
Technion  City 
Haifa,  Israel 

Professor  P.  A.  P.  Moran  1 

Department  of  Statistics 

Australian  National  University 

P.  0.  Box  4 

Canberra,  Australia   2600 

Professor  Daryl  Daley  1 

Department  of  Statistics 

Australian  National  University 

P.O.  Box  4 

Canberra,  Australia    2600 

Marvin  Denicoff  1 

Office  of  Naval  Research 
Arlington,  Virginia    22217 

Dr .  Thomas  Varley  1 

Office  of  Naval  Research 
Arlington,  Virginia   22217 


17 


Dr.  Bruce  McDonald  1 

Office  of  Naval  Research 
Arlington,  Virginia   22217 

Professor  Walter  Smith  1 

Statistics  Department 
University  of  North  Carolina 
Chapel  Hill,  North  Carolina   27514 

Dr.  Glenn  Bacon  1 

Systems  Department 

IBM  Research 

Monterey  and  Cottle  Roads 

San  Jose,  California   95100 

Professor  Leonard  Kleinrock  1 

Department  of  Computer  Science 

University  of  California 

Los  Angeles,  California   90024 

Dr.  Bill  Mitchell  1 

Department  of  Management  Sciences 
California  State  College 
Hayward ,  California   94542 

Dr.  Donald  P.  Gaver  1 

Department  of  Operations  Research 

and  Administrative  Sciences 
Naval  Postgraduate  School 
Monterey,  California  93940 

David  Amsbury  1 

NASA  Manned  Spacecraft  Center 
Houston,  Texas   77058 

Richard  V.  Andree  1 

Department  of  Information  and 

Computer  Sciences 
University  of  Oklahoma 

Julius  Aronofsky  1 

University  Computing  Corporation 
1500  UCC  Tower 
P.  0.  Box  6228 
Dallas,  Texas  75222 

George  Atkins  -*- 

Department  of  Computer  Science 
Southwestern  State  College 
Weatherford,  Oklahoma   73096 


18 


Charles  Bacon 
Department  of  Engineering 
Oklahoma  State  University 
Stillwater,  Oklahoma   74074 

T.  E.  Bailey 

Computing  and  Information  Systems 
Oklahoma  State  University 
Stillwater,  Oklahoma   74074 

Lee  J.  Bain 

Math  Department 

University  of  Missouri  at  Rolla 

Rolla,  Missouri 

Jeff  Bangert 
Computation  Center 
Kansas  University 
Lawrence,  Kansas   66044 

Thomas  J.  Boardman 
Statistics  Department 
Colorado  State  University 
Ft.  Collins,  Colorado   80521 

R.  B.  Breitenbach 

College  of  Business  Administration 
Oklahoma  State  University 
Stillwater,  Oklahoma  74074 

Lyle  Broemeling 
Department  of  Math  and  Science 
Oklahoma  State  University 
Stillwater,  Oklahoma   74074 

Gary  Carlson 
Computer  Research  Center 
Brigham  Young  University 
Provo,  Utah   84601 

John  P.  Chandler 

Computing  and  Information  Systems 
Oklahoma  State  University 
Stillwater,  Oklahoma   74074 

Larry  Claypool 

Department  of  Math  and  Science 
Oklahoma  State  University 
Stillwater,  Oklahoma   74074 


19 


Claude  Cohen 
Computer  Center 
Northwestern  University 
2129  Sheridan  Road 
Evanston,  Illinois   60201 

Eli  Cohen 

Vogelback  Computer  Center 
2129  Sheridan 
Northwestern  University 
Evanston,  Illinois   60201 

Herbert  T.  Davis 
Department  of  Mathematics 
University  of  New  Mexico 
Albuquerque,  New  Mexico   87106 

Arthur  D.  Dayton 
Department  of  Statistics 
Kansas  State  University 
Manhattan,  Kansas   66502 

Hamed  Eldin 

Department  of  Industrial  Engineering 
Oklahoma  State  University 
Stillwater,  Oklahoma   74074 

Professor  William  F.  Fellner 
University  of  California 
Irvine,  California  92664 

Donald  Fisher 

Computing  Information  Science  Dept. 
Oklahoma  State  University 
Stillwater,  Oklahoma   74074 

J.  L.  Folks 

Department  of  Math  and  Science 
Oklahoma  State  University 
Stillwater,  Oklahoma   74074 

Alan  S.  Galbraith 

Math  Division 

U.S.  Army  Research  Office 

Box  CM 

Duke  Station 

Durham,  North  Carolina   27706 

Charles  E.  Gates 
Institute  of  Statistics 
Texas  A  &  M  University 
College  Station,  Texas   77843 


20 


Philip  J.  Gensler 

Department  of  Computing  Information  Systems 

West  Texas  State  University 

Canyon,  Texas   79015 

Robert  M.  Gordon 
University  of  California 
Irvine,  California   92664 

Dennis  E.  Grawoig 
Georgia  State  University 
33  Gilmer  Street,  N.E. 
Atlanta,  Georgia   30303 

Robert  Gumm 

Computer  Center 

Oklahoma  State  University 

Stillwater,  Oklahoma   74074 

Professor  H.  0.  Hartley 
Texas  A  &  M  University 
College  Station,  Texas   77840 

G.  E.  Hedrick 

Computer  Information  Science  Department 

Oklahoma  State  University 

Stillwater,  Oklahoma   74074 

William  G.  Hunter 
Department  of  Statistics 
University  of  Wisconsin 
Madison,  Wisconsin   53706 

Kaye  Ready 
Medical  Center 
106  Gilbert  Drive 
University  of  Arkansas 
Little  Rock,  Arkansas   72205 

William  J.  Kennedy 
Statistical  Lab 
Iowa  State  University 
Ames,  Iowa   50010 

Ignacy  Kotlarski 

Department  of  Math  and  Statistics 
Oklahoma  State  University 
Stillwater,  Oklahoma   74074 


21 


Ronald  McNew 

Department  of  Math  and  Statistics 
Oklahoma  State  University 
Stillwater,  Oklahoma  74074 

Paul  D.  Minton 
Department  of  Statistics 
Southern  Methodist  University 
Dallas,  Texas   75222 

Robert  Morrison 

Department  of  Math  and  Statistics 
Oklahoma  State  University 
Stillwater,  Oklahoma   74074 

Billie  J.  Pease 
Department  of  the  Interior 
Geological  Survey 
Computer  Center  Division 
18th  and  C  Streets,  N.W. 
Room  1452 
Washington,  D.  C.   20242 

William  S.  Peters 
University  of  New  Mexico 
Albuquerque,  New  Mexico   87106 

Norval  F.  Pohl 

Department  of  Quantative  Methods 
University  of  Santa  Clara 
Santa  Clara,  California 

Julius  Reichbach 

Technological  University 

Mathematics  Room  2004 

Holland,  Delft  8 

Julianalaam  132,  The  Netherlands 

David  P.  Rutten 
Graduate  School  of  Business 
Indiana  University 
Bloomington,  Indiana   47401 

Brian  Schott 

Georgia  State  University 

Atlanta,  Georgia   30303 

David  J.  Schumacher 

Lockheed  Missiles  and  Space  Company 

Sunnyvale,  California   94088 


22 


Dan  Scott 

North  Texas  State  University 

P.  0.  Box  13886 

Denton,  Texas   76203 

W.  T.  Stille 
Eastman  Kodak 
343  State  Street 
Rochester,  New  York 

Sundaram  Swetharanyam 

Main  Campus  Computer  Center 

McNeese  State  University 

Lake  Charles,  Louisiana   70601 

Jack  Testerman 

University  of  Southwestern  Louisiana 

Box  940 

Lafayette,  Louisiana   70501 

Carolyn  S.  Thompson 
Medical  Center 
University  of  Arkansas 
Little  Rock,  Arkansas   72201 

W.  M.  Usher 

O.S.U.  Computing  Center 
Oklahoma  State  University 
Stillwater,  Oklahoma   74074 

James  Van  Doren 

Computing  Information  Sciences  Department 

Oklahoma  State  University 

Stillwater,  Oklahoma   74074 

David  Weeks 

Department  of  Math  and  Science 
Oklahoma  State  University 
Stillwater,  Oklahoma   74074 

Eric  N.  West 
University  of  Alberta 
Edmonton,  Canada 

Wray  Wilkes 
Computing  Center 
University  of  Arkansas 
Fayetteville ,  Arkansas 

Sing--chou  Wu 

Computer  Science  and  Statistics  Department 

California  Polytechnic  College 

San  Lius  Obispo,  California 


23 


Geoffrey  Gates 
400  Computer  Center 
Michigan  State  University 
E.  Lansing,  Michigan   48823 

W.  V.  Accola 

Computer  Center 

Oklahoma  State  University 

Stillwater,  Oklahoma   74074 

Kenneth  Bray 

905  Asp 

University  of  Oklahoma 

Norman.  Oklahoma   73069 

Professor  Amrit  L.  Goel 
427  Linil  Hall 
Syracuse  University 
Syracuse,  New  York   13210 

Y.  C.  Lu 

College  of  Agriculture 
Oklahoma  State  University 
Stillwater,  Oklahoma   74074 

John  W.  Meredith 

Stephen  F.  Austin  State  University 

Box  6163 

Nacogdoches,  Texas   75961 

Patrick  L.  Odell 
Texas  Technical  University 
Department  of  Mathematics 
Lubbock,  Texas   79409 

John  M.  Chambers 

Bell  Telephone  Laboratories 

Murray  Hill,  New  Jersey    97974 

Prof.  W.  Morven  Gentlemen 
Dept.  Computer  Science 
University  of  Waterloo 
Waterloo 
Ontario,  CANADA 

Prof.  W.  J.  Hemmerle 
Computer  Lab. 

University  of  Rhode  Island 
Kingston,  Rhode  Island    02881 


24 


Dr.  M.  Schatzoff 

IBM  Cambridge  Scientific  Center 

IBM  Corporation 

545  Technology  Square 

Cambridge,  MA  02139  1 

Prof.  J.  W.  Tukey 

Department  of  Statistics 

Princeton  University 

Princeton,  New  Jersey    08540  1 

Professor  G.  Marsaglia 

School  of  Computer  Science 

McGill  University 

P.  0.  Box  6070 

Montreal  101,  P.  Q.   CANADA  1 

Professor  Walter  Freiberger 

Director,  Center  for  Computer  Science 

Brown  University 

George  Street 

Providence,  Rhode  Island    02912  1 

Professor  J.  N.  Arvesen 

Dept .  of  Statistics 

Purdue  University 

Lafayette,  Indiana    47901  1 

Prof .  P .  A.  W.  Lewis 
Department  of  Operations  Research 
and  Administrative  Sciences 
Naval  Postgraduate  School 
Monterey,  CA  93940  25 


25 


This  page  left  purposely  blank 


26 


Security  Classification 


twrxerr-"**— wnw   ..•*****nr--  »^*.- -*/^  .r   -r-_-        -       •n— vrn  ■?.— -».: 


DOCUMENT  COHYUOL  DATA  -R&D 

(Steurlly   clattlfleitlon   of  till*,    body  el  0bit:m<-l  and  lnd**lr.1  »ino.\-»/?n  dj<I  fc»   tnfcrjj  H'han   Mi»   overall  rtporl  It   f  bx/fiorfj 

j«.  r:pokt  iicu^ity  claiuficatign 
Unclassified 


I  .    OKKINATINS    ACTIVITY    (Cor?OT»t*   MutAOT) 

Naval  Postgraduate  School 
Monterey,  California 


1 


2b.  OROU* 


1      REPORT     TITLI 

Large-Scale   Computer -Aided    Statistical  Mathematics 


4.   DESCRIPTIVE  notci  (Typ*  ot  ropsrt  «nd,/-ic(:jo/ vt  6mj 

Technical  Report 


I     AUTHOnUlfr'/MlfUs:!,  b/lw(»  Millil,  lt»t  na/r.t) 


Peter   A.    W.    Lewis 


•  ■  utMom  DATE 

November    197  2 


7a.    TOTAL    NO.    OF    PACK.* 


76.  no.  or  r:pi 

54 


la.    CONTRACT    On    CHANT    NO. 


b.    PROJECT  NO. 


»*.    ORIGINATOR"!   REPORT   NUMDER(l) 


NPS55LW72111A 


Si.   OTHta   fiiPOR  T  NO (8)  (Any  othtr  nuz-btrt   LSai  iu/  bt  tttirnid 
•hit  report) 


d. 


10.    OIBTfll  OUTION    IT*TtW*NT 


Approved  for  public  release;  distribution  unlimited 


II.    SUPPLEMENTARY    NOTE* 


12.    IPONSOrilNG    MILITARY    ACTIVITY 


II.    AUlTRACT 


Some  thoughts  on  large-scale  computer-aided  statistical  mathematics 
(primarily  simulation)  which  were  presented  at  the  6th  Annual  Conference 
on  the  Computer  Science/Statistics  Interface  conference  are  presented. 
Comments  of  participants  and  panelists  (D .  F.  Andrews,  J.  N.  Arvesen, 
D.  P.  Gaver,  and  G.  Marsaglia)  have  been  added  to  the  original  text. 


If*   OIOI-207-03H 


■CTJ-rrr-'r;'  I  — COT!  — i  uji'-xaaajauiia-^ars 


»,:?ij."»C7  CjjDiiu^j-'.n 


t-IUM 


Si  .iirit\    OlmmficaMon 


K  Cv    wo  m o» 


Computers 

Simulation 

Quantiles 

Jackknif e 

Variance  reduction  techniques 

Sorting 

Statistical  mathematics 

Multiprogrammed  computers 

Percentiles 

Random  number  generation 

Multiplicative  congruential  generators 

Ordering 

Bias  reduction 


LINK      A 


WOlt 


«OL  E 


hm  ti 


DD  ,F.f„?..1473 

S/N  0101-807-8821 


(BACK) 


Security  Classification 


«  -  3  I  409 


U151350 


DUDLEY  KNOX  LIBRARY  -  RESEARCH  REPORTS 


5  6853  01058185  3 


